
Discriminative and informative features for biomolecular text mining with ensemble feature selection

Authors: Reverter, S. Van Landeghem, T. Abeel, Y. Saeys, Y. Van de Peer
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: In the field of biomolecular text mining, the black-box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black-box behavior and the end-user who has to interpret the results.

Sources: CiteSeerX, Crossref, Ghent University Academic Bibliography, PubMed Central

Highlights from the 6th International Society for Computational Biology Student Council Symposium at the 18th Annual International Conference on Intelligent Systems for Molecular Biology

Authors: Christiaan Klijn, F Xin, G Macintyre, H Hettling, J Behr, J Larson, M McDowall, Magali Michaut, MP Magariños, P Surendran, P Vanhee, S Banton, S Carmona, S Shah, T Abeel, Thomas Abeel, XF Li
Publication venue: BioMed Central
Publication date: 01/01/2010

This meeting report gives an overview of the keynote lectures and a selection of the student oral and poster presentations at the 6th International Society for Computational Biology Student Council Symposium, which was held as a precursor event to the annual international conference on Intelligent Systems for Molecular Biology (ISMB). The symposium was held in Boston, MA, USA on July 9th, 2010.

Sources: University of Toronto Research Repository, Crossref, TU Delft Repository, Springer - Publisher Connector, PubMed Central

The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

Authors: A Ivshina, Anne-Claire Haury, C Ambroise, C Fan, C Lai, C Sotiriou, F Reyal, G Abraham, H Zou, I Guyon, J Bi, J Mairal, J Wang, Jean-Philippe Vert, JPA Ioannidis, L Ein-Dor, M Dai, Muy-Teck Teh, N Meinshausen, P Wirapati, Pierre Gestraud, R Kohavi, R Shen, R Simon, R Tibshirani, RA Irizarry, S Michiels, T Abeel, T Barrett, T Iwamoto, W Shi, Y Benjamini, Y Pawitan, Y Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 23/06/2011

Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/

Sources: arXiv.org e-Print Archive, Public Library of Science (PLOS), Crossref, Directory of Open Access Journals, PubMed Central, HAL Descartes, HAL-MINES ParisTech

The impact of sequence length and number of sequences on promoter prediction performance

Authors: C Cortes, D Dineen, J Han, J Zeng, JR Landis, K Florquin, L Breiman, Luiz H de C Merschmann, M Kuhn, N Japkowicz, P Baldi, P Meysman, R Yamashita, Renata Guerra-Sá, S Carvalho, Sávio G Carvalho, T Abeel, TM Cover, U Ohler, V Grishkevich, Y Gan
Publication venue: Springer Science and Business Media LLC

Sources: Crossref

Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

Authors: A Bird, A Marson, A Rodriguez, A Sandelin, AP Bird, Arindam Bhattacharjee, Ben Gordon, CD Schmid, Christopher K. Patil, D Karolchik, David L. Corcoran, DL Corcoran, DP Bartel, DS Prestridge, E Wingender, F Ozsolak, GD Stormo, GG Loots, GM Borchert, H Wakaguri, HJ Bussemaker, HK Saini, I Rigoutsos, IP Ioshikhes, J Taylor, J van Helden, K Woods, KD Taganov, Kusum V. Pandit, M Gardiner-Garden, M Megraw, MJ Buck, MP Brown, N Liu, Naftali Kaminski, NJ Martinez, O Chapelle, P Carninci, P Jin, Panayiotis V. Benos, R Gangal, R Shalgi, RM Kuhn, S Baskerville, S Fujita, S Mahony, SJ Cooper, T Abeel, T Thum, T Wang, TA Down, U Ohler, WJ Kent, X Zhao, X Zhou, Y Lee
Publication venue: Public Library of Science (PLoS)
Publication date: 01/04/2009

Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but the transcriptional regulation of miRNA genes is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoters. However, the length of the primary transcripts and the organization of the promoters are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes, we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

Sources: Public Library of Science (PLOS), Crossref, Directory of Open Access Journals, PubMed Central, D-Scholarship@Pitt

ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles

Authors: Aerts, Bajic, Baldi, Brent, Carninci, Chen, Choi, Davuluri, Deng, Down, Fickett, Florquin, Goni, Gross, Kanhere, Kawaji, Knudsen, Liolios, P. Rouze, Pedersen, Ponger, Prestridge, Reese, Sandelin, Scherf, Sonnenburg, T. Abeel, Wang, Won, Y. Saeys, Y. Van de Peer
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet common practice to integrate transcription start site prediction into genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.

Sources: Crossref, Ghent University Academic Bibliography, PubMed Central

Comparative analysis of Mycobacterium and related actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis

Authors: Abeel Thomas, Dolganov Gregory, Galagan James, Iacobelli-Martinez Milena, Kidd Matthew J, Koehrsen Mike, Maer Andreia M, McGuire Abigail Manson, Park Sang Tae, Peterson Matthew, Raman Sahadevan, Regev Aviv, Riley Robert, Schoolnik Gary K, Sisk Peter, Stolte Christian, Wapinski Ilan, Weiner Brian, White Jared, Yamamoto Robert T, Zucker Jeremy
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum. Results: Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. They also suggest that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four. Conclusions: Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.

Sources: DSpace@MIT, Crossref, Harvard University - DASH, Springer - Publisher Connector, Directory of Open Access Journals, PubMed Central